Structured tabular data exist across nearly all fields. Reasoning task over these data aims to answer questions or determine the truthiness of hypothesis sentences by understanding the semantic meaning of a table. While previous works have devoted significant efforts to the tabular reasoning task, they always assume there are sufficient labeled data. However, constructing reasoning samples over tables (and related text) is labor-intensive, especially when the reasoning process is complex. When labeled data is insufficient, the performance of models will suffer an unendurable decline. In this paper, we propose a unified framework for unsupervised complex tabular reasoning (UCTR), which generates sufficient and diverse synthetic data with complex logic for tabular reasoning tasks, assuming no human-annotated data at all. We first utilize a random sampling strategy to collect diverse programs of different types and execute them on tables based on a "Program-Executor" module. To bridge the gap between the programs and natural language sentences, we design a powerful "NL-Generator" module to generate natural language sentences with complex logic from these programs. Since a table often occurs with its surrounding texts, we further propose novel "Table-to-Text" and "Text-to-Table" operators to handle joint table-text reasoning scenarios. This way, we can adequately exploit the unlabeled table resources to obtain a well-performed reasoning model under an unsupervised setting. Our experiments cover different tasks (question answering and fact verification) and different domains (general and specific), showing that our unsupervised methods can achieve at most 93% performance compared to supervised models. We also find that it can substantially boost the supervised performance in low-resourced domains as a data augmentation technique. Our code is available at https://github.com/leezythu/UCTR.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive baseline results on EuroParl-ST, VoxPopuli and FLEURS test sets. Enabled by the multilinguality of SpeechMatrix, we also explore multilingual speech-to-speech translation, a topic which was addressed by few other works. We also demonstrate that model pre-training and sparse scaling using Mixture-of-Experts bring large gains to translation performance. The mined data and models are freely available.
translated by 谷歌翻译
对少量语义分割(FSS)的研究引起了极大的关注,目的是在查询图像中仅给出目标类别的少数注释的支持图像。这项具有挑战性的任务的关键是通过利用查询和支持图像之间的细粒度相关性来充分利用支持图像中的信息。但是,大多数现有方法要么将支持信息压缩为几个班级原型,要么在像素级别上使用的部分支持信息(例如,唯一的前景),从而导致不可忽略的信息损失。在本文中,我们提出了密集的像素,互源和支持的注意力加权面膜聚合(DCAMA),其中前景和背景支持信息都是通过配对查询和支持特征之间的多级像素的相关性通过多级像素的相关性充分利用的。 DCAMA在变压器体系结构中以缩放点产生的关注实现,将每个查询像素视为令牌,计算其与所有支持像素的相似之处,并预测其分割标签是所有支持像素标签的添加剂聚集 - 相似之处。基于DCAMA的唯一公式,我们进一步提出了对N-shot分割的有效有效的一通推断,其中所有支持图像的像素立即为掩模聚集收集。实验表明,我们的DCAMA在Pascal-5i,Coco-20i和FSS-1000的标准FSS基准上显着提高了最先进的状态以前的最佳记录。烧烤研究还验证了设计dcama。
translated by 谷歌翻译
由于其效率,一声神经架构搜索(NAS)已被广泛用于发现架构。但是,先前的研究表明,由于架构之间的操作参数过度共享(即大共享范围),架构的一声绩效估计可能与他们在独立培训中的表现没有很好的相关性。因此,最近的方法构建了更高参数化的超级链,以降低共享程度。但是这些改进的方法引入了大量额外的参数,因此在培训成本和排名质量之间导致不良的权衡。为了减轻上述问题,我们建议将课程学习应用于共享范围(接近),以有效地训练超级网。具体而言,我们在一开始就以很大的共享范围(简单的课程)训练超网,并逐渐降低了超级网的共享程度(更难的课程)。为了支持这种培训策略,我们设计了一个新颖的超级网(闭合性),该超级网(CLESENET)将参数从操作中解耦,以实现灵活的共享方案和可调节的共享范围。广泛的实验表明,与其他一击的超级网络相比,Close可以在不同的计算预算限制中获得更好的排名质量,并且在与各种搜索策略结合使用时能够发现出色的体系结构。代码可从https://github.com/walkerning/aw_nas获得。
translated by 谷歌翻译
最近,已广泛研究了基于深度学习的方法,以进行可变形的图像注册任务。但是,大多数努力将复合图像表示形式直接映射到通过卷积神经网络的空间转换,而忽略了其捕获空间对应关系的有限能力。另一方面,变压器可以更好地表征与注意机制的空间关系,其远程依赖性可能对注册任务有害,在这种情况下,距离太大的体素不太可能是相应的对。在这项研究中,我们提出了一个新型的变形器模块,以及用于可变形图像配准任务的多尺度框架。变形器模块旨在通过将位移矢量预测作为几个碱基的加权总和来促进从图像表示到空间转换的映射。借助多尺度框架以粗略的方式预测位移字段,与传统和基于学习的方法相比,可以实现卓越的性能。进行了两个公共数据集的全面实验,以证明所提出的变形器模块以及多规模框架的有效性。
translated by 谷歌翻译
尽管深度强化学习(DRL)取得了巨大的成功,但由于过渡和观察的内在不确定性,它可能遇到灾难性的失败。大多数现有的安全加固学习方法只能处理过渡干扰或观察障碍,因为这两种干扰影响了代理的不同部分。此外,受欢迎的最坏情况可能会导致过度悲观的政策。为了解决这些问题,我们首先从理论上证明了在过渡干扰和观察障碍下的性能降解取决于一个新颖的价值函数范围(VFR),这与最佳状态和最坏状态之间的价值函数的间隙相对应。基于分析,我们采用有条件的价值风险(CVAR)作为对风险的评估,并提出了一种新颖的强化学习算法的CVAR-Proximal-Policy-oftimization(CPPO),该算法通过保持风险敏感的约束优化问题形式化。它的CVAR在给定的阈值下。实验结果表明,CPPO获得了更高的累积奖励,并且在Mujoco中一系列连续控制任务上的观察和过渡干扰更加强大。
translated by 谷歌翻译
我们提出了用于将Swin变压器缩放到3亿参数的技术,并使其能够使用高达1,536美元的图像培训1,536美元。通过缩放容量和分辨率,Swin变压器在四个代表视觉基准上设置新记录:84.0%的Top-1在Imagenet-V2图像分类准确度,63.1 / 54.4盒/掩模地图上的Coco对象检测,59.9 Miou在Ade20K语义细分中,在动力学-400视频动作分类上的86.8%的前1个精度。我们的技术通常适用于缩放视觉模型,这尚未广泛探索为NLP语言模型,部分原因是培训和应用中的困难:1)视觉模型经常面临规模的不稳定问题,2)许多下游愿景任务需要高分辨率图像或窗口,并且目前尚不清楚如何有效地将模型在低分辨率上预先培训到更高分辨率。当图像分辨率高时,GPU存储器消耗也是一个问题。为了解决这些问题,我们提出了几种技术,通过使用Swin Transformer作为案例研究来说明:1)归一化技术和缩放的余弦注意力,提高大视觉模型的稳定性; 2)一种日志间隔的连续位置偏置技术,以有效地将在低分辨率图像和窗口预先训练的模型转移到其更高分辨率的对应物。此外,我们分享了我们的关键实施细节,导致GPU内存消耗的大量节省,从而使得用常规GPU培训大型视觉模型可行。使用这些技术和自我监督的预训练,我们成功培训了强大的3B往返变压器模型,并有效地将其转移到涉及高分辨率图像或窗口的各种视觉任务,实现了各种最先进的准确性基准。
translated by 谷歌翻译
在预先建立的3D环境图中,高精度摄像头重新定位技术是许多任务的基础,例如增强现实,机器人技术和自动驾驶。近几十年来,基于点的视觉重新定位方法已经发达了,但在某些不足的情况下不足。在本文中,我们设计了一条完整的管道,用于使用点和线的相机姿势完善,其中包含创新设计的生产线提取CNN,名为VLSE,线匹配和姿势优化方法。我们采用新颖的线表示,并根据堆叠的沙漏网络自定义混合卷积块,以检测图像上的准确稳定的线路功能。然后,我们采用基于几何的策略,使用表极约束和再投影过滤获得精确的2D-3D线对应关系。构建了以下点线关节成本函数,以通过基于纯点的本地化的初始粗姿势优化相机姿势。在开放数据集(即线框上的线提取器)上进行了足够的实验,在INLOC DUC1和DUC2上的定位性能,以确认我们的点线关节姿势优化方法的有效性。
translated by 谷歌翻译
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
translated by 谷歌翻译